Marginal median SOM for document organization and retrieval
Abstract
The self-organizing map algorithm has been used successfully in document organization. We now propose using the same algorithm for document retrieval. Moreover, we test the performance of the self-organizing map when the linear Least Mean Squares adaptation rule is replaced with the marginal median. We present two implementations of this variant: one quantizes the real-valued feature vectors to integer-valued ones, and the other uses the real-valued vectors directly. Experiments with both implementations demonstrate superior performance over the standard self-organizing map, measured by the number of training iterations needed for the mean square error (i.e. the average distortion) to drop to e^(-1) ≈ 36.788% of its initial value. Furthermore, the performance of a document organization and retrieval system employing the self-organizing map architecture and its variant is assessed using average recall-precision curves evaluated on two corpora: the first comprises manually selected web pages with touristic content, and the second is Reuters-21578, Distribution 1.0.
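The difference between the two adaptation rules named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the initialization from randomly chosen samples, and the omission of neighborhood updates are all choices made here for brevity. The linear rule moves the winning neuron toward the input, while the marginal median rule sets each component of the winner to the median of that component over the vectors assigned to the neuron so far.

```python
import numpy as np


def lms_update(w, x, lr):
    """Linear (LMS-style) SOM step: move the winner a fraction lr toward x."""
    return w + lr * (x - w)


def marginal_median_update(assigned):
    """Marginal median step: component-wise median of the assigned vectors."""
    return np.median(np.asarray(assigned, dtype=float), axis=0)


def train_mm_som(data, n_neurons, seed=0):
    """One pass of a simplified marginal median SOM.

    Assumptions (not from the source): neurons start at randomly chosen
    samples, and only the winner is updated (no neighborhood function).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    start = rng.choice(len(data), n_neurons, replace=False)
    weights = data[start].copy()
    # Each neuron keeps the list of vectors it has won so far.
    assigned = [[w.copy()] for w in weights]
    for x in data:
        # Winner = neuron whose weight vector is closest to x (Euclidean).
        winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
        assigned[winner].append(x)
        weights[winner] = marginal_median_update(assigned[winner])
    return weights
```

The median is taken independently per feature, which is what makes it "marginal". It also makes the update less sensitive to outliers than a mean-like rule: for the values 1, 2 and 100, the median is 2 while the mean is about 34.3.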
Similar resources
A combination of Wilcoxon test and R-estimates for document organization and retrieval
The Wilcoxon signed-rank test is exploited for document organization and retrieval in this paper. A novel modeling method for documents and a distance metric between documents are proposed. Both document modeling and document comparisons are based on signed-ranks and are applied to the frequency of occurrence of the document bigrams. A metric using the Wilcoxon signed-rank test exploits these s...
On the Variants of the Self-Organizing Map That Are Based on Order Statistics
Two well-known variants of the self-organizing map (SOM) that are based on order statistics are the marginal median SOM and the vector median SOM. In the past, their efficiency was demonstrated for color image quantization. In this paper, we employ the well-known IRIS data set and we assess their performance with respect to the accuracy, the average over all neurons mean squared error between t...
An Intelligent Tool for Building E-Learning Contend- Material Using Natural Language in Digital Libraries
This paper develops an intelligent searching tool using the Self-Organizing Map (SOM) algorithm, as a prototype e-content retrieval tool. The proposed searching tool can be adjusted and scaled to any e-learning platform that requires concept-based queries. The SOM algorithm has been used successfully for document organization as well as for document retrieval. In the propo...
Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps
Clustering and visualization of large text document collections aid browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive two-dimensional format. Document topics are inferred usin...
SOM-based Document Image Retrieval
In this paper we discuss some applications of word image clustering (based on Self Organizing Maps, SOM) for tasks related to document image retrieval. Two main applications are discussed: document retrieval and word retrieval. In document retrieval a document representation based on the vector model is obtained by computing the occurrences of words belonging to the SOM clusters in each documen...
Journal:
- Neural Networks: the official journal of the International Neural Network Society
Volume 17, Issue 3
Publication date: 2004